Abstract

Adapted wavelet analysis of signals is achieved by optimizing
a selected criterion. We recently introduced a majorization
framework for constructing selection functionals, which can be
as well suited to compression as entropy or others. We show how
these functionals operate on the basis selection and their effect
on the statistics of the resulting representation.


1  Introduction 

Multiscale analysis has permeated most applied science and
engineering applications, largely on account of its simple and
efficient implementation. In addition, it provides a highly
flexible adaptive framework using Wavelet Packet (WP) and local
trigonometric dictionaries [1, 2, 3]. The remarkable impact it
has had on signal processing applications is reflected by the
vibrant interest from the basic/applied research communities in
its apparently naturally suited framework for signal compression
[4]. Adapted wavelet representations have further raised
enthusiasm by providing a perhaps optimal and yet efficiently
achievable transform domain for compression (merely via a
selection criterion).

Various criteria for optimizing adapted representations have
been proposed in the literature [5, 6, 7], the first and perhaps
the best known being the entropy criterion. This was proposed on
the basis that the most preferable representation for a given
signal is that which is the most parsimonious, i.e. that which
compresses the energy into the fewest basis function
coefficients. We have recently recast the search for an optimized
wavelet basis into a majorization theoretic framework [8], which
is briefly described later. This framework not only makes the
construction of new criteria simple, but also raises questions
about their physical interpretation and their impact on the
statistics of the resulting representation. While the first
question was addressed and answered quite satisfactorily [8], the
second, to the best of our knowledge, remains open. To address
this issue, we view the basis search as an optimization of a
functional over a family of probability density functions which
result from the various possible representations of the WP
dictionary. We show that for an appropriately selected
optimization (or cost) criterion, the resulting Probability
Density Function (PDF) of the coefficients for the optimized
representation will decrease rapidly (at least as fast as
linearly).

In the next section, we present some relevant background as well
as the problem formulation. In Section 3 we present the analysis
of the optimization leading to an adapted wavelet basis of a
given signal $y(t)$. In Section 4 we provide some illustrative
examples.


2  Background  and  Formulation 

2.1  Best  Basis  Representations 

The determination of the “best representation” or Best Basis
(BB) of a signal in a wavelet packet or Malvar’s wavelet basis
generally relies on the minimization of an additive criterion.
The entropy is usually retained as a cost function but, as will
be shown later, other criteria may be constructed to introduce an
alternative viewpoint. To obtain an efficient search of the BB,
the dictionary $\mathcal{D}$ of possible bases is structured
according to a binary tree. Each node $(j,m)$ (with
$j \in \{0,\ldots,J\}$ and $m \in \{0,\ldots,2^{j}-1\}$) of the
tree then corresponds to a given orthonormal basis $B_{j,m}$ of a
vector subspace of $\ell^2(\{1,\ldots,K\})$. An orthonormal basis
of $\ell^2(\{1,\ldots,K\})$ is then
$B_{\mathcal{P}} = \bigcup_{(j,m)\,:\,I_{j,m} \in \mathcal{P}} B_{j,m}$,
where $\mathcal{P}$ is a partition of $[0,1[$ in intervals
$I_{j,m} = [2^{-j}m,\, 2^{-j}(m+1)[$. By taking advantage of the
property

$$\mathrm{Span}\{B_{j,m}\} = \mathrm{Span}\{B_{j+1,2m}\} \oplus \mathrm{Span}\{B_{j+1,2m+1}\},$$

a fast bottom-up tree search algorithm was developed in [1] to
optimize the partition $\mathcal{P}$. The coefficients of an
observed signal $y(t)$ are henceforth denoted by $\{x_i\}$.
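
The bottom-up search can be sketched in a few lines. The following is an illustration only, not the implementation of [1]: it assumes an orthonormal Haar filter pair, a unit-energy signal whose length is a power of two, and the additive entropy cost $-\sum_i x_i^2 \log x_i^2$. The names `haar_split`, `entropy_cost` and `best_basis` are hypothetical.

```python
import math

def haar_split(block):
    """Split a block into its orthonormal Haar low-pass and high-pass halves."""
    s = 1.0 / math.sqrt(2.0)
    half = len(block) // 2
    low = [s * (block[2 * i] + block[2 * i + 1]) for i in range(half)]
    high = [s * (block[2 * i] - block[2 * i + 1]) for i in range(half)]
    return low, high

def entropy_cost(coeffs):
    """Additive cost -sum(x^2 log x^2); zero coefficients contribute nothing."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def best_basis(signal, max_depth, cost=entropy_cost):
    """Return (total cost, leaves), where leaves is a list of ((j, m), coefficients).

    The children of node (j, m) are (j+1, 2m) and (j+1, 2m+1). A node is
    split only if the summed cost of its best children is strictly lower,
    so the comparison is resolved from the bottom of the tree upwards.
    """
    def search(block, j, m):
        own = cost(block)
        if j == max_depth or len(block) < 2:
            return own, [((j, m), block)]
        low, high = haar_split(block)
        c0, leaves0 = search(low, j + 1, 2 * m)
        c1, leaves1 = search(high, j + 1, 2 * m + 1)
        if c0 + c1 < own:
            return c0 + c1, leaves0 + leaves1
        return own, [((j, m), block)]
    return search(list(signal), 0, 0)

# A constant unit-energy signal: all the energy ends up in one coefficient.
signal = [1.0 / math.sqrt(8.0)] * 8
total_cost, leaves = best_basis(signal, max_depth=3)
selected_nodes = [node for node, _ in leaves]
```

Because the cost is additive over the nodes of the partition, each parent/children comparison is local, which is what makes the search fast.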

2.2  Majorization  Theoretic  Approach 

We have recently recast this BB search problem [8] into the
context of majorization theory developed in mathematical analysis
in the 1930’s [9]. Evaluating two candidate representations for
an observed process $y(t)$ in a dictionary of bases entails a
comparison of two corresponding quantitative measures. These can
in theory be defined to reflect any desired specific property of
the process [8], and thereby allow us to generalize the class of
possible criteria mentioned in the previous section. This was in
fact inspired by an effective mechanism first proposed in
econometrics [10] and later formalized and further generalized in
[9].

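To compare, say, two vectors $\alpha$ and
$\gamma \in \mathbb{R}_+^n$ (i.e. positive real), we could
evaluate the spreads of their components to establish a property
of majorization of one vector by the other. Let these vectors be
rank ordered in a decreasing manner and subsequently denoted by
$\{\alpha_{[i]}\}$ (i.e. $\alpha_{[i]} \geq \alpha_{[i+1]}$,
$i = 1, \ldots, n-1$); we then have,

Definition 1. For $\alpha$ and $\gamma \in \mathbb{R}_+^n$, we
say that $\alpha \prec \gamma$, or $\alpha$ is majorized by
$\gamma$, if

$$\sum_{i=1}^{k} \alpha_{[i]} \leq \sum_{i=1}^{k} \gamma_{[i]}, \qquad k = 1, \ldots, n-1,$$

$$\sum_{i=1}^{n} \alpha_{[i]} = \sum_{i=1}^{n} \gamma_{[i]}.$$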

Note that in the case of an entropy-based BB search, the
comparison carried out on the wavelet packet coefficients is
similar to the majorization procedure described above. This
theory has also spawned a variety of questions regarding the
choice of functionals (or criteria) acting upon these vectors and
preserving the majorization. Many properties have been
established [9], and one which is of central importance herein is
that any optimization functional $g(\cdot)$ we select must be
order preserving, i.e.

$$\text{if } \alpha \prec \gamma \;\Rightarrow\; g(\alpha) \leq g(\gamma).$$

This not only brings insight into the problem, but also provides
the impetus to further study the various convex/concave criteria
typically invoked in the optimization.
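
A minimal sketch of the majorization test and of the order-preserving property follows. It assumes vectors of equal length with nonnegative entries, and takes $g(p) = \sum_i p_i \log p_i$ (the negative entropy) as one example of an order-preserving functional. The names `is_majorized_by` and `neg_entropy`, the tolerance, and the two test vectors are illustrative choices.

```python
import math

def is_majorized_by(alpha, gamma, tol=1e-12):
    """Return True if alpha is majorized by gamma (alpha ≺ gamma)."""
    if len(alpha) != len(gamma):
        return False
    a = sorted(alpha, reverse=True)   # decreasing rearrangement of alpha
    g = sorted(gamma, reverse=True)   # decreasing rearrangement of gamma
    partial_a = 0.0
    partial_g = 0.0
    for a_i, g_i in zip(a[:-1], g[:-1]):
        partial_a += a_i
        partial_g += g_i
        if partial_a > partial_g + tol:   # a partial-sum inequality fails
            return False
    return abs(sum(a) - sum(g)) <= tol    # the totals must agree

def neg_entropy(p):
    """Example order-preserving functional: sum p_i log p_i, with 0 log 0 = 0."""
    return sum(p_i * math.log(p_i) for p_i in p if p_i > 0.0)

alpha = [0.25, 0.25, 0.25, 0.25]   # evenly spread energy
gamma = [0.7, 0.1, 0.1, 0.1]       # concentrated energy

alpha_below_gamma = is_majorized_by(alpha, gamma)
gamma_below_alpha = is_majorized_by(gamma, alpha)
order_preserved = neg_entropy(alpha) <= neg_entropy(gamma)
```

Here the evenly spread vector is majorized by the concentrated one, and the functional respects that ordering; minimizing the entropy $-g$ therefore favours the concentrated representation.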


2.3  Formulation 

The criteria used in majorization are based on using isotonic or
order-preserving functionals $\mathcal{I}(\cdot)$ which can be
shown to satisfy Schur convexity/concavity¹ [9]. In its general
form, a BB search then aims at minimizing a functional
$J(f(x), x)$, where $f(x)$ represents the common PDF of the
wavelet coefficients, which are also subject to constraints.
Formally, we may state the problem as

$$\min_{f(x)} J(x, f(x)) = \min_{f(x)} \int \left[ \mathcal{I}(f(x)) + \lambda\, C(f(x), x) \right] dx, \qquad (1)$$

where $C(\cdot)$ specifies some implicit or explicit constraints.
Our focus in this paper is, for a given $\mathcal{I}(\cdot)$, to
determine the statistical properties of the coefficients in the
optimized representation, or more precisely the class of
“$f(x)$” which leads to the minimization of a given functional.

¹ Schur convexity/concavity is tied to convexity/concavity
and isotonicity (or order-preservation).


3  Statistical  Analysis 

The majorization approach may be viewed as a unifying framework
which provides the necessary theoretical justifications for all
previously proposed BB criteria (e.g. the entropy criterion), and
which equips one with the theoretical underpinnings and insight
for other extensions. This indeed paves the way for a plethora of
other possible search principles aimed at reflecting
characteristics other than parsimony, for instance [8].

Recall, however, that the parsimony of representation lies at
the heart of the originally proposed criteria [1], and various
heuristic/justifying statements about the distributions of
wavelet coefficients were presented.

Proposition 1. Any order preserving continuous functional
$\mathcal{I}(\cdot)$ satisfying the above (convexity/concavity)
properties, and which when optimized leads to a BB selection of a
signal $y(t)$, results in an overall density function $f(x)$ of
the coefficients which is at least $o(x^{\alpha})$ as
$x \to \infty$ (i.e. decreases at least at a linear rate).

Proof. Concentrating on a general and to be specified functional
$\mathcal{I}(\cdot)$ in Eq. 1, with the constraints on $f(x)$ to
be a valid PDF and on the coefficients to have a finite moment,
we may (e.g.) write the following,

$$J_{\min}(x, f(x)) = \min_{f(x)} \left\{ \int \mathcal{I}(f(x))\,dx + \lambda_1 \int_0^{\infty} x^{\alpha} f(x)\,dx + \lambda_2 \left( \int f(x)\,dx - 1 \right) \right\}. \qquad (2)$$

Using standard variational techniques of optimization [11] to
find the stationary point of $J(\cdot,\cdot)$, the following
results,

$$\delta J = \mathcal{I}'_{f(x)}(f(x)) + \lambda_1 x^{\alpha} + \lambda_2 = 0, \qquad (3)$$

where $\mathcal{I}'_{f(x)}(\cdot)$ denotes a differentiation with
respect to $f(\cdot)$. The functional $\mathcal{I}(\cdot)$ being
concave/convex leads to a decreasing/increasing
$\mathcal{I}'_{f(x)}(\cdot)$. Using the following standard
theorem on monotone increasing/decreasing functions,

Theorem 1. Let $G : D \to \mathbb{R}$ be strictly increasing (or
decreasing) on $D$. Then there exists a unique inverse function
$G^{-1}$ which is strictly monotone increasing (or decreasing) on
$G(D)$,

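we conclude that we have an increasing/decreasing inverse
function everywhere, except possibly at a finite set of points,
or

$$f(x) = \left(\mathcal{I}'_{f(x)}\right)^{-1}\!\left(-\lambda_1 x^{\alpha} - \lambda_2\right),$$

with the $\lambda_i$’s ensuring the properties of $f(x)$. ■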
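
As a numerical sanity check of the stationarity condition in Eq. 3, the sketch below makes several assumptions that are not part of the proof: the convex functional $\mathcal{I}(f) = f \log f$ of Section 3.1.1, $\alpha = 1$, coefficients supported on $[0, \infty)$, and a hypothetical prescribed mean `m`. Under these assumptions Eq. 3 reads $\log f + 1 + \lambda_1 x + \lambda_2 = 0$, whose solution is the exponential density $f(x) = e^{-x/m}/m$ with $\lambda_1 = 1/m$ and $\lambda_2 = \log m - 1$.

```python
import math

def stationary_density(x, lam1, lam2, alpha=1.0):
    """Solve Eq. 3 for I(f) = f log f: log f + 1 + lam1 * x**alpha + lam2 = 0."""
    return math.exp(-1.0 - lam2 - lam1 * x ** alpha)

def trapezoid(func, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h

m = 2.0                      # assumed (hypothetical) first moment
lam1 = 1.0 / m               # multiplier fixed by the moment constraint
lam2 = math.log(m) - 1.0     # multiplier fixed by the unit-mass constraint

def f(x):
    return stationary_density(x, lam1, lam2)

# The upper limit 60 truncates a tail of mass about exp(-30), which is negligible.
mass = trapezoid(f, 0.0, 60.0)
mean = trapezoid(lambda x: x * f(x), 0.0, 60.0)

# Residual of Eq. 3 at an arbitrary point; it should vanish up to rounding.
x0 = 3.0
residual = math.log(f(x0)) + 1.0 + lam1 * x0 + lam2
```

The multipliers play exactly the role described in the proof: they are fixed by the constraints, while the decay of $f(x)$ is dictated by the inverse of $\mathcal{I}'$.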


3.1  Criteria 


3.1.1  Entropy: 

The entropy criterion first proposed in [1] is
($\mathcal{I}(x) = -x \log x$).

Property 1. $\mathcal{I}(f(x)) = f(x) \log f(x)$ is a convex
functional of $f(x)$.

Proof: This can easily be seen by taking the second derivative
w.r.t. $f(x)$ and noting that $f(x)$ is nonnegative. ■

Using the approach described above, one can simply derive the
maximizing density as

$$f(x) = \exp\{\lambda_1 + \lambda_2 |x| + 1\}, \qquad (4)$$

which, when using Definition 1 for the BB search, also leads to
the minimization of the entropy of the resulting representation.


3.1.2  Lorenz  Criterion:

In studying the spread of components of a vector, one might
consider looking at the center of mass and at its variation as a
function of $x$. Let us define

$$F(x) = \int_{-\infty}^{x} f(u)\,du, \qquad (5)$$

$$\Phi(x) = \frac{1}{\mu} \int_{-\infty}^{x} u f(u)\,du, \qquad (6)$$

where we recognize in $\Phi(x)$ the “local center” of gravity (or
local mean) and in $F(x)$ the cumulative population or the
probability at a point $x$. The graph of the former versus the
latter coincides precisely with the Lorenz curve [10] shown in
Fig. 1, which also forms the basis of Gini’s concentration
criterion [9]. The lower curve “B” is more concentrated than
curve “A”, which clearly represents a more uniform distribution
of the coefficients. In this case, the goal is to maximize the
distance (or the area enclosed) between the “uniform-indicating”
curve and that indicating more concentration, or,

$$\mathcal{I}_G(f(x)) = \int F(x)\,d\Phi(x) - \int \Phi(x)\,dF(x), \qquad (7)$$

leading once again to the following optimization problem,

$$\min_{f(x)} J(x, f(x)) = \min_{f(x)} \left\{ \mathcal{I}_G(f(x)) + \lambda_1 \int f(x)\,dx - \lambda_2 \int x f(x)\,dx \right\}.$$

Figure 1: Continuous Lorenz Curve

Using techniques from calculus of variations [11], this criterion
may be “extremized” (maximize $\mathcal{I}_G(\cdot)$) to solve
for the class of $f(x)$, which can be solved after much algebra.
Instead we can use the method of the Legendre transform, which is
precisely constructed using the distance between “A” and “B”
[11],

$$L(p, F) = pF - \Phi(F), \qquad (8)$$

which will achieve an extremum for $dL/dF = 0$ or
$d\Phi/dF = p$, which can be rewritten as,

$$\frac{d\Phi}{dF} = \frac{d\Phi/dx}{dF/dx} = \frac{x}{\mu} = p,$$

or for $p = 1$,

$$x = \mu = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad (9)$$

leading to the fact that $f(x)$ must necessarily be decreasing
much more rapidly than $x$. ■
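
A discrete counterpart of the Lorenz construction can be sketched as follows. It assumes the quantities being compared are nonnegative (e.g. coefficient energies $x_i^2$) with a positive total, and it uses the fact that, with $F$ and $\Phi$ both running from 0 to 1, integration by parts turns Eq. 7 into $1 - 2\int \Phi\,dF$, i.e. twice the area between the diagonal and the Lorenz curve. The names `lorenz_curve` and `gini` and the two sample vectors are illustrative.

```python
def lorenz_curve(values):
    """Return the points (F_k, Phi_k) of the empirical Lorenz curve.

    F_k is the fraction of the population with the k smallest values,
    Phi_k is the fraction of the total carried by those k values.
    """
    v = sorted(values)
    n = len(v)
    total = sum(v)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, v_k in enumerate(v, start=1):
        running += v_k
        points.append((k / n, running / total))
    return points

def gini(values):
    """Discrete analogue of Eq. 7: 1 - 2 * (area under the Lorenz curve)."""
    pts = lorenz_curve(values)
    area = 0.0
    for (f0, p0), (f1, p1) in zip(pts, pts[1:]):
        area += 0.5 * (p0 + p1) * (f1 - f0)   # trapezoid between adjacent points
    return 1.0 - 2.0 * area

uniform_energies = [1.0, 1.0, 1.0, 1.0]        # Lorenz curve lies on the diagonal
concentrated_energies = [0.0, 0.0, 0.0, 4.0]   # all the energy in one coefficient

g_uniform = gini(uniform_energies)
g_concentrated = gini(concentrated_energies)
```

A representation whose energies give the larger value has the Lorenz curve that lies further below the diagonal, which is the sense in which the criterion is maximized.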

Our analysis results in a rigorous solution stating that the
class of distributions which lead to the extrema of the criteria
is of polynomial/exponential decay. This is a significant result
in its own right since, to the best of our knowledge, it is the
first rigorous proof whose result, not surprisingly, corroborates
the appealing and heuristic notion of energy concentration, which
has been the basis of all previously proposed algorithms.


4  Applications

The appeal of this result is twofold:

1. It provides a strong theoretical argument/justification for
previously proposed BB search criteria.

2. It provides insight for further improving BB searches,
particularly in noisy environments.

In particular, these results can be turned around to specify one
of the properties of an exponential distribution, which is known
to be “optimal”, as the criterion of optimization. More
specifically, we may use the “shape factor” of the density
$f(x)$, which can be viewed as a robust global measure, less
prone to variability in the presence of noise. The shape factor
can be evaluated in the Maximum Likelihood sense for the WP
tableau, for instance, and used to efficiently prune the binary
tree to result in a BB. In contrast to recently proposed
algorithms, we avoid explicitly using the (perhaps) strong a
priori assumption of normality of the noise, and our criterion
here is obtained by proceeding “in reverse” (i.e. in light of the
distribution properties of the “optimal” representation, we
optimize the intermediate distributions in order to achieve it).
Similarly, the second criterion analyzed above is used as a
measure of the distribution of the coefficients on the tree and
optimized to achieve a BB.

In Fig. 2, we show for illustration the histogram of a typical
signal (ramp signal) in noise and that of the resulting BB
coefficients.

Acknowledgement:  Thanks  are  due  to  Dr.  J-C 
Pesquet  for  comments. 